AITopics | token representation

White-Box Transformers via Sparse Rate Reduction

Neural Information Processing Systems | Apr-25-2026, 16:03:30 GMT

In this paper, we contend that the objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction. From this perspective, popular deep networks such as transformers can be naturally viewed as realizing iterative schemes to optimize this objective incrementally. Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens. This leads to a family of white-box transformer-like deep network architectures which are mathematically fully interpretable. Despite their simplicity, experiments show that these networks indeed learn to optimize the designed objective: they compress and sparsify representations of large-scale real-world vision datasets such as ImageNet, and achieve performance very close to thoroughly engineered transformers such as ViT.
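The abstract does not spell out the lossy coding rate. A form commonly used in the rate reduction literature is R(Z; eps) = 1/2 logdet(I + d/(N eps^2) Z Z^T) for N tokens of dimension d, and the minimal sketch below assumes that form. The function name `coding_rate`, the value of `eps`, and the synthetic token sets are illustrative assumptions, not taken from the paper. It only shows that tokens confined to a low-dimensional subspace have a lower coding rate than equally energetic tokens spread over the full space, which is the sense in which compression is measured.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Assumed form: R(Z; eps) = 1/2 logdet(I + d/(N eps^2) Z Z^T), Z of shape (d, N)."""
    d, n = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet

rng = np.random.default_rng(0)
d, n = 8, 200

# Synthetic tokens spread over all d dimensions.
Z_full = rng.standard_normal((d, n))

# Synthetic tokens confined to a random 2-dimensional subspace.
basis = np.linalg.qr(rng.standard_normal((d, 2)))[0]
Z_low = basis @ rng.standard_normal((2, n))

# Rescale so both token sets have the same total energy (Frobenius norm).
Z_low *= np.linalg.norm(Z_full) / np.linalg.norm(Z_low)

rate_full = coding_rate(Z_full)
rate_low = coding_rate(Z_low)
print(f"full-space tokens: {rate_full:.3f}")
print(f"subspace tokens:   {rate_low:.3f}")
```

With equal energy, the subspace-confined tokens should report the smaller rate, since the log-determinant is largest when the energy is spread evenly over all directions.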

artificial intelligence, machine learning, representation, (18 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

0e0157ce5ea15831072be4744cbd5334-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 13:54:38 GMT

A.1 Dataset Details & Evaluation Metrics

As stated earlier, the main applications of Extreme Multi-label Text Classification are in e-commerce (product recommendation and dynamic search advertisement) and in document tagging, where the objective of an algorithm is to place correct labels among the top-k recommendation or advertisement slots. Thus, to evaluate the methods, we use precision at k (denoted by P@k) and its propensity-scored variant (denoted by PSP@k) [17]. These are standard metrics widely used by the XMC community [4]. Since P@k treats all labels equally, it does not reveal the performance of the model on tail labels. However, because of the long-tailed label distribution in XMC datasets, one of the main challenges is to predict tail labels correctly, which may be more valuable and informative than head classes.
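The two metrics can be sketched as follows, assuming the usual definitions: P@k is the fraction of the top-k predicted labels that are relevant, and PSP@k weights each hit by the inverse of that label's propensity so that correct tail labels count for more. The function names, the toy scores, and the propensity values are hypothetical, chosen only for illustration. Reported PSP@k values are often further normalized by the best achievable score, which this sketch omits.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """P@k: mean fraction of relevant labels among the k highest-scored labels."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(labels, top_k, axis=1)
    return hits.sum(axis=1).mean() / k

def psp_at_k(scores, labels, propensities, k):
    """Unnormalized PSP@k: each hit is weighted by 1 / propensity of its label."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(labels, top_k, axis=1)
    weights = 1.0 / propensities[top_k]
    return (hits * weights).sum(axis=1).mean() / k

# Toy example: 2 documents, 4 labels (label 0 is a head label, label 3 a tail label).
scores = np.array([[0.9, 0.1, 0.8, 0.2],
                   [0.2, 0.7, 0.1, 0.6]])
labels = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 1]])
propensities = np.array([0.9, 0.8, 0.5, 0.2])  # hypothetical values

p_at_2 = precision_at_k(scores, labels, k=2)
psp_at_2 = psp_at_k(scores, labels, propensities, k=2)
print(p_at_2, psp_at_2)
```

In this toy case the second document's correct tail label (label 3) contributes a weight of 1/0.2 = 5 to PSP@2, while it counts as a single hit in P@2.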

artificial intelligence, machine learning, natural language, (19 more...)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.53)
Information Technology > Artificial Intelligence > Natural Language (0.35)

f4fba41b554f9aaa013c4062a1c40518-Paper-Conference.pdf

Neural Information Processing Systems | Feb-18-2026, 17:13:15 GMT

large language model, machine learning, natural language, (20 more...)

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

fde1a69a5b6e554b2f1f727197d2651d-Paper-Conference.pdf

Neural Information Processing Systems | Feb-18-2026, 03:41:28 GMT

data mining, machine learning, natural language, (19 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
(3 more...)

Improved Generation of Adversarial Examples Against Safety-aligned LLMs

Qizhang Li

Neural Information Processing Systems | Feb-17-2026, 10:57:52 GMT

Yiwen Guo leads the project and serves as the corresponding author.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

57a9b97477b67936298489e3c1417b0a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 01:58:23 GMT

data mining, machine learning, natural language, (19 more...)

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

871cae8f599cb8bbfcb0f58fe1af95ad-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 12:16:51 GMT

contrastive search, representation, simctg, (15 more...)

Country:

Asia > China > Hong Kong (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(8 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Do Vision Transformers See Like Convolutional Neural Networks?

Neural Information Processing Systems | Feb-9-2026, 01:46:52 GMT

Convolutional neural networks (CNNs) have so far been the de facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks.

artificial intelligence, deep learning, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

1e118ba9ee76c20df728b42a35fb4704-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 16:16:26 GMT

architecture, representation, section 2, (16 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model

Li, Ao, Duan, Yuxiang, Zhang, Jinghui, Ma, Congbo, Xie, Yutong, Carneiro, Gustavo, Yaqub, Mohammad, Wang, Hu

arXiv.org Artificial Intelligence | Nov-18-2025

Large Vision-Language Models (LVLMs) have advanced multimodal learning but face high computational costs due to the large number of visual tokens, motivating token pruning to improve inference efficiency. The key challenge lies in identifying which tokens are truly important. Most existing approaches rely on attention-based criteria to estimate token importance. However, they inherently suffer from certain limitations, such as positional bias. In this work, we explore a new perspective on token importance based on token transitions in LVLMs. We observe that the transition of token representations provides a meaningful signal of semantic information. Based on this insight, we propose TransPrune, a training-free and efficient token pruning method. Specifically, TransPrune progressively prunes tokens by assessing their importance through a combination of Token Transition Variation (TTV), which measures changes in both the magnitude and direction of token representations, and Instruction-Guided Attention (IGA), which measures how strongly the instruction attends to image tokens. Extensive experiments demonstrate that TransPrune achieves multimodal performance comparable to the original LVLMs, such as LLaVA-v1.5 and LLaVA-Next, across eight benchmarks, while reducing inference TFLOPs by more than half. Moreover, TTV alone can serve as an effective criterion without relying on attention, achieving performance comparable to attention-based methods. The code will be made publicly available upon acceptance of the paper at https://github.com/liaolea/TransPrune.
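The abstract describes TTV only as measuring changes in the magnitude and direction of token representations and gives no formula. The sketch below is therefore a hypothetical illustration of that idea, not the TransPrune implementation: it scores each token by its relative norm change plus one minus the cosine similarity between its representation before and after a layer, then keeps the highest-scoring tokens. The function names, the additive combination, and the toy vectors are all assumptions, and IGA is not modeled.

```python
import numpy as np

def transition_score(h_in, h_out, eps=1e-8):
    """Hypothetical transition score per token; h_in and h_out have shape (tokens, dim)."""
    norm_in = np.linalg.norm(h_in, axis=-1)
    norm_out = np.linalg.norm(h_out, axis=-1)
    # Magnitude change: relative difference in vector norm.
    magnitude_change = np.abs(norm_out - norm_in) / (norm_in + eps)
    # Direction change: one minus cosine similarity.
    cosine = (h_in * h_out).sum(axis=-1) / (norm_in * norm_out + eps)
    direction_change = 1.0 - cosine
    # Assumed combination: a plain sum of the two terms.
    return magnitude_change + direction_change

def keep_top_tokens(scores, keep_ratio):
    """Return the indices of the highest-scoring tokens, in their original order."""
    n_keep = max(1, int(round(keep_ratio * scores.shape[0])))
    keep = np.argsort(-scores)[:n_keep]
    return np.sort(keep)

# Toy representations of 4 tokens before and after one layer.
h_in = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
h_out = np.array([[1.0, 0.0, 0.0],   # unchanged
                  [0.0, 2.0, 0.0],   # norm doubled, same direction
                  [0.0, 1.0, 0.0],   # same norm, rotated by 90 degrees
                  [1.1, 1.0, 0.0]])  # small change

scores = transition_score(h_in, h_out)
kept = keep_top_tokens(scores, keep_ratio=0.5)
print(scores, kept)
```

In this toy setup the two tokens that change most, one in magnitude and one in direction, are retained, while the unchanged and nearly unchanged tokens are pruned.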

large language model, machine learning, natural language, (19 more...)

2507.2063

Genre: Research Report > New Finding (0.46)

Technology: